The Makita Drill

Author

Giovanni Forchini

Published

January 26, 2017

“Data Scientist: The Sexiest Job of the 21st Century” was the title of an HBR article from 2012. It is unclear what a data scientist is, but it seems to be an individual who uses data to solve problems. In this sense, econometricians are data scientists.

Using data to solve problems involves the use of mathematics, statistics, programming, quantitative reasoning and initiative. Although many people decide to adopt the title of ‘data scientist’, I believe these skills are and will continue to be in short supply for the foreseeable future, at least based on my experience in the UK and Australia.

In the 20th Century, statistics and econometrics were generally taught using mathematical tools such as matrices, vectors, linear algebra and calculus. These are now rarely used outside PhD level. Knowledge of mathematics and technical concepts among statistics and econometrics students is very limited. Explanations are often replaced by ‘hand waving’. Intuition, which used to supplement the understanding of a result or a technique, has become the ‘understanding’ - some of my more inquisitive students have found the lack of detail in the treatment of key results frustrating. In addition, the mathematics that is used is often taught in a very procedural way which does not emphasise conceptual understanding. Universities used to encourage problem solving, but this is often no longer considered a priority. Teachers are pressured not to fail students and to make sure that a large proportion of students are awarded top marks. Learning to solve problems involves both effort and the possibility of failure, and this is no longer acceptable - see the THES article “Prepare students ‘to fail’ so they can learn, report suggests”. A 2012 RSA report states that “English universities are sidelining quantitative and mathematical content because students and staff lack the requisite confidence and ability”. As a result, students are often not given the chance to build on their technical knowledge and to tackle non-routine problems.

This tendency to avoid technical details appears to be becoming more pervasive in society as a whole. The same 2012 RSA report remarked that in Japan and China, more than 50% of degrees are awarded in STEM subjects, but this percentage falls to less than 25% in the UK and only 16% in the US. Similar concerns have been raised in Australia. An article in THES in December 2016 emphasises that ‘reading popular science articles causes non-scientists to overrate their expertise’. LinkedIn groups on data science are full of articles with fancy titles, lots of waffle and little substance, authored by self-proclaimed evangelists, pioneers, leaders, champions etc. Similarly, job specs for data scientists are full of the latest buzzwords but lack content. The skills required are often listed as R, SAS, Python, Tableau, etc. However, these are not skills but tools - it is as if a builder had the skills to use a Makita drill, but not a drill of a different brand. If one understands concepts and methods, it takes close to no time to choose the right tool for the job and to learn the correct way of using it. On the other hand, knowing how to use a tool does not guarantee that analysts know what they are doing: the builder may know how to operate the Makita drill but may have no clue about where to drill or what drill bits to use, and, as we all know, this leads to expensive mistakes and wasted time.

A simple look at any Kaggle competition shows that many competitors can reach good positions on the leaderboard by arbitrarily changing a few values in a publicly shared procedure. They can use the tools, but they are just guessing where to drill. By chance, they may find a wall stud, but they may also drill through a water pipe or an electric cable, or may irresponsibly fix a heavy load-bearing shelf to a plasterboard wall. Do these individuals have the skills of a data scientist? Are these the type of data scientists that add value to a company, a business, a research project or society in general? Is this the type of data science we should teach and encourage? Is this type of data science really the sexiest job of the 21st Century?